Conversation
When _prune_old_tool_results truncates a large tool_call arguments
string, it byte-slices to 200 chars and appends the literal
sentinel "...[truncated]". The resulting string is not valid JSON
(unterminated mid-value), and some providers — notably Anthropic via
LiteLLM — re-parse function.arguments when rebuilding tool_use
blocks. The next API call fails with HTTP 400 ("Unterminated string
starting at: line 1 column 35"), and every subsequent turn replays
the same corrupted history, so the session is permanently stuck.
Observed in the wild: a session that compressed with 11 prior patch
tool_calls (each >500 chars) then returned HTTP 400 on every inbound
message. The 3 retries are identical payloads, so they all fail the
same way. The user sees only a canned error fallback.
Replace the byte slice with a compact JSON sentinel that preserves
provenance (original length + 200-char preview) while staying
parseable.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
handsdiff pushed a commit that referenced this pull request on Apr 18, 2026
…ts (NousResearch#11745)

Move moonshotai/kimi-k2.5 to position #1 in every model picker list:
- OPENROUTER_MODELS (with 'recommended' tag)
- _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface
- _model_flow_kimi() Coding Plan model list in main.py

The kimi-coding-cn and moonshot lists already had kimi-k2.5 first.
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
When the live Vercel AI Gateway catalog exposes a Moonshot model with zero input AND output pricing, it is promoted to position #1 as the recommended default — even if the exact ID isn't in the curated AI_GATEWAY_MODELS list. This enables dynamic discovery of new free Moonshot variants without requiring a PR to update curation. Paid Moonshot models are unaffected, and the picker falls back to the normal curated recommended tag when no free Moonshot is live.
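The promotion rule above could be sketched as follows; the catalog entry shape, field names, and the helper name are assumptions for illustration, not the project's real API:

```python
def promote_free_moonshot(catalog: list[dict], curated: list[str]) -> list[str]:
    """Put a zero-priced live Moonshot model at position #1, else keep curation.

    Hypothetical sketch: assumes catalog entries carry 'id', 'input_price',
    and 'output_price' keys, which is an illustration, not the gateway schema.
    """
    for entry in catalog:
        is_moonshot = entry.get("id", "").startswith("moonshotai/")
        # Both prices must be zero; default to a nonzero price when missing
        # so absent pricing data never counts as "free".
        is_free = (
            float(entry.get("input_price", 1)) == 0.0
            and float(entry.get("output_price", 1)) == 0.0
        )
        if is_moonshot and is_free:
            # Dynamic discovery: the ID need not appear in the curated list.
            rest = [m for m in curated if m != entry["id"]]
            return [entry["id"]] + rest
    # No free Moonshot live: normal curated order with its recommended tag.
    return curated
```

Defaulting missing prices to a nonzero value is the conservative choice: a model is only promoted on positive evidence that it is free.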
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
…#13354)

Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message.

From Utku's bug report on PR NousResearch#13205: a mid-run /steer arrived minutes later, at the end of the turn, as a /queue-style message, completely defeating its purpose.

Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return).

No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route, so the fallback "No agent running; queued as next turn" message is preserved.

Validation:
- 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior.
- All 21 existing tests in tests/run_agent/test_steer.py still pass.
- Live PTY end-to-end test with real agent + real openrouter model:
  22:36:22 API call #1 (model requested execute_code)
  22:36:26 ENTER FIRED: agent_running=True, text='/steer ...'
  22:36:26 INLINE STEER DISPATCH fired
  22:36:43 agent.log: 'Delivered /steer to agent after tool batch'
  22:36:44 API call #2 included the steer; response contained marker

The same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended.
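The gating described above can be sketched as a tiny class; only agent-side thread safety, _agent_running, and the queue-vs-inline split come from the commit message itself — the class shape and handler names here are hypothetical:

```python
class CliSketch:
    """Hypothetical stand-in for the classic CLI's input handling."""

    def __init__(self):
        self._agent_running = False
        self._pending_input: list[str] = []   # drained by process_loop when idle
        self.dispatched: list[str] = []       # records inline dispatches

    def process_command(self, text: str) -> None:
        # In the real CLI this routes to agent.steer(), which is thread-safe
        # and returns instantly, so calling it on the UI thread is safe.
        self.dispatched.append(text)

    def _should_handle_steer_command_inline(self, text: str) -> bool:
        """True when /steer must bypass the queue: an agent run is in
        progress, so process_loop (blocked inside chat()) can't drain it."""
        return self._agent_running and text.strip().startswith("/steer")

    def on_enter(self, text: str) -> None:
        if self._should_handle_steer_command_inline(text):
            self.process_command(text)        # inline, mid-run
        else:
            self._pending_input.append(text)  # idle path: queue and drain
```

The idle path is untouched on purpose: a /steer with no active agent still queues, preserving the existing fallback message.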
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call happened to fail for a different reason, control fell through to the needs_reauth path which called _bump_server_error — adding to an already-tripped count instead of the fresh count the reconnect justified. With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable.

Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before the retry_call. The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal — but the retry starts from a clean count (1 after a failure), not a stale one.
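A minimal sketch of the reordering, with the breaker helpers passed in as callables; the names mirror the commit message (_reset_server_error, _bump_server_error, retry_call) but the surrounding function shape is an assumption:

```python
def call_after_reconnect(server_name, reconnect, wait_ready, retry_call,
                         _reset_server_error, _bump_server_error):
    """Retry a server call after a successful reconnect, with the breaker
    cleared BEFORE the retry rather than after it succeeds."""
    reconnect(server_name)
    wait_ready(server_name)
    # A successful reconnect + ready-wait is positive evidence the server
    # is viable: clear the breaker now, so an unrelated retry failure
    # bumps a fresh count (1), not a stale already-tripped one.
    _reset_server_error(server_name)
    try:
        return retry_call()
    except Exception:
        # A genuinely broken server still re-trips the breaker as normal.
        _bump_server_error(server_name)
        raise
```

The ordering is the whole fix: reset on reconnect evidence, bump only on fresh failures.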
Summary
In context_compressor.py, Pass 3 truncates large tool_call arguments strings by byte-slicing to 200 chars and appending the literal sentinel ...[truncated]. The result is invalid JSON: an unterminated string cut mid-value. Some providers (notably Anthropic via LiteLLM) re-parse function.arguments when rebuilding tool_use blocks. Once a session is compressed with any tool_call whose original args exceeded 500 chars (routine for patch), the next API call fails with HTTP 400 ("Unterminated string starting at: line 1 column 35"). The 3 retries send identical payloads, so they all fail identically, and every subsequent user turn replays the corrupted history. The session is permanently stuck and the user sees only a canned error fallback.
Fix
Replace the byte slice with a compact JSON sentinel that preserves provenance (original length + 200-char preview) while staying parseable:
{"_truncated": true, "_original_length": 1247, "_preview": "..."}

Observed in the wild
A production session had 11 prior patch tool_calls poisoned by this truncation after compression. Every subsequent user message triggered a new failure cycle — the user saw 3 cryptic LiteLLM 400 error strings back-to-back with no way to recover (the poison is in server-side history).

Test plan
- New test test_pruned_tool_call_args_remain_valid_json asserts the pruned arguments are parseable via json.loads and carry a sensible sentinel shape.
- Existing TestTokenBudgetTailProtection tests still pass.
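A sketch of what that assertion shape could look like; the sentinel here is built inline as a stand-in, since the real test would obtain it by running _prune_old_tool_results over a message list with >500-char arguments:

```python
import json


def test_pruned_tool_call_args_remain_valid_json():
    # Hypothetical stand-in for the pruner's output; the real test calls
    # _prune_old_tool_results and reads the pruned arguments string back.
    pruned = json.dumps({
        "_truncated": True,
        "_original_length": 1247,
        "_preview": '{"path": "a.py", "diff": "...',
    })

    # Must not raise — surviving json.loads is the whole point of the fix.
    parsed = json.loads(pruned)

    # Sentinel shape: flag, provenance, and a preview shorter than the
    # original it summarizes.
    assert parsed["_truncated"] is True
    assert parsed["_original_length"] > len(parsed["_preview"])
    assert isinstance(parsed["_preview"], str)
```

The parse-then-shape structure keeps the test meaningful even if the preview length or field order changes later.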